RMarkdown is a powerful “literate programming” environment, meaning that one can easily combine text, code and output - including interactive htmlwidgets. There are three types of document that can be created:
All HTML documents (including reports and presentations) created with RMarkdown can be published free of charge (and publicly) to RPubs.com.
All of the content below uses the same code as the cheats_data-viz.R file but demonstrates how different types of content can be combined. The data behind these examples was extrapolated and anonymised from the following Figshare deposit: https://doi.org/10.6084/m9.figshare.4516772.
The following packages are used in this document, and all data is from the data/cheats_journeys.csv file.
library("tidyverse")
library("leaflet")
library("highcharter")
library("sf")
library("statesRcontiguous")
library("lubridate")
journeys <- read_csv("data/cheats_journeys.csv")
The leaflet library allows us to create interactive maps. Here are two common GIS visualisations; to learn more about GIS visualisations, please visit the Oxshef Charts website.
Scatter geo plots are very useful for simply showing where events occurred. In the map below, circle markers show the start and end locations in red and green, respectively.
journeys %>%
leaflet() %>%
addTiles() %>%
addCircleMarkers(
lng = ~ start.longitude,
lat = ~ start.latitude,
color = "red",
radius = 1
) %>%
addCircleMarkers(
lng = ~ end.longitude,
lat = ~ end.latitude,
color = "green",
radius = 1
)
The leaflet library allows us to easily visualise choropleths, provided we have appropriate shapefiles available. Here we prepare a shapefile of the contiguous USA using the statesRcontiguous library:
contiguous_usa <- shp_all_us_states %>%
filter(contiguous.united.states == TRUE)
Using the sf library it is fairly easy to calculate the number of journeys that started in each State:
state_send_locs <- journeys %>%
filter(start.country == "USA") %>%
st_as_sf(
coords = c("start.longitude", "start.latitude"),
crs = st_crs(contiguous_usa)
)
contiguous_send_counts <- contiguous_usa %>%
mutate(send.counts = st_covers(contiguous_usa, state_send_locs) %>%
lengths())
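To make the counting step easier to follow, here is a minimal sketch on toy data. The two squares and three points are hypothetical stand-ins for the state polygons and journey start locations, and planar coordinates without a CRS are assumed to keep the example small. st_covers returns, for each polygon, the indices of the points it covers, and lengths turns that list into one count per polygon.

```r
library("sf")

# Helper (hypothetical) that builds a unit square with its lower-left corner at (xmin, ymin)
square <- function(xmin, ymin) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + 1, ymin), c(xmin + 1, ymin + 1),
    c(xmin, ymin + 1), c(xmin, ymin)
  )))
}

# Two toy regions standing in for state polygons
toy_regions <- st_sf(
  region = c("west", "east"),
  geometry = st_sfc(square(0, 0), square(2, 0))
)

# Three toy points: two inside "west", one inside "east"
toy_points <- st_as_sf(
  data.frame(lng = c(0.2, 0.8, 2.5), lat = c(0.5, 0.5, 0.5)),
  coords = c("lng", "lat")
)

# One list element per region, holding the indices of the covered points;
# lengths() converts that list into a count per region
toy_counts <- lengths(st_covers(toy_regions, toy_points))
toy_counts
```

The result has one entry per row of toy_regions, in the same order, which is why it can be assigned directly as a new column with mutate.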
Finally, we can create a palette and visualise this data:
palette_contiguous_us <-
colorBin("YlOrBr", domain = contiguous_send_counts$send.counts)
contiguous_send_counts %>%
leaflet() %>%
addPolygons(
fillColor = ~ palette_contiguous_us(send.counts),
fillOpacity = 1,
weight = 1,
color = "#000",
label = ~ paste0(state.name, " (", send.counts, " journeys)")
) %>%
addLegend(pal = palette_contiguous_us,
values = ~send.counts)
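The palette object created with colorBin is itself a function, which is why it can be called inside the fillColor formula. A small sketch, assuming hypothetical counts in place of send.counts:

```r
library("leaflet")

# Hypothetical counts standing in for the send.counts column
toy_counts <- c(0, 5, 20, 80, 100)

# colorBin() returns a function mapping numbers to hex colour strings,
# with values grouped into bins across the supplied domain
toy_palette <- colorBin("YlOrBr", domain = toy_counts)

# Calling the palette on a numeric vector gives one colour per value
toy_colours <- toy_palette(toy_counts)
toy_colours
```

Values that fall into the same bin receive the same colour, so the number of distinct colours on the map is at most the number of bins.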
ggplot2 is an extraordinarily powerful and versatile tool for visualising data with R, implementing a consistent and complete “grammar of graphics”. As an example of how far you can go with ggplot2, this is a calendar heatmap showing the distribution of letters sent through the 1860s in the journeys dataset.
dated_journeys <- journeys %>%
select(date, number.of.letters) %>%
mutate(year = year(date),
yearmonthf = paste(month(date, label = TRUE), year(date)),
monthf = month(date, abbr = TRUE, label = TRUE),
week = week(date),
monthweek = ceiling(day(date) / 7),
weekdayf = wday(date,label = TRUE))
gg_dated_journeys <- dated_journeys %>%
filter(date >= dmy("01-01-1860") & date <= dmy("31-12-1869")) %>%
ggplot(aes(monthweek, weekdayf, fill = number.of.letters)) +
geom_tile(colour = "white") +
facet_grid(year~monthf) +
scale_fill_gradient(low="red", high="green") +
labs(x="Week of Month",
y="",
title = "Number of letters sent per day in the 1860s",
fill = "Number of letters")
gg_dated_journeys
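The monthweek column used for the x axis of the heatmap above is computed as ceiling(day / 7). A short sketch with hypothetical dates shows what that produces: days 1 to 7 of a month land in week 1, days 8 to 14 in week 2, and so on, so these are seven-day blocks counted from the first of the month rather than calendar weeks.

```r
library("lubridate")

# Hypothetical dates in January 1865
toy_dates <- dmy(c("01-01-1865", "07-01-1865", "08-01-1865", "29-01-1865"))

# Day of month divided by 7 and rounded up gives the week-of-month block
toy_monthweek <- ceiling(day(toy_dates) / 7)
toy_monthweek
# → 1 1 2 5
```

Because months can have up to 31 days, the value ranges from 1 to 5.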
The highcharter library allows us to create professional-looking interactive charts and plots. It’s important to note that the underlying Highcharts JavaScript library is NOT free for commercial or governmental use, though we can use it when communicating research outputs.
Both ggplot2 and highcharter use a very similar syntax:
ggplot(data, aes(x = x, y = y)): the aes function sets the aesthetics for the ggplot object, which columns in data should be used for which visual properties of the chart.
hchart(data, hcaes(x = x, y = y)): the hcaes function sets the aesthetics for the highcharter object, which columns in data should be used for which visual properties of the chart.
If you refer to the .Rmd file used to generate this report, you’ll notice the section header has {.tabset} appended. This allows us to create the tabbed content below from any child subheading of the current heading level.
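As a minimal side-by-side sketch of the aes and hcaes parallel, the same scatter plot can be built with both libraries. This uses the built-in mtcars dataset rather than the journeys data, purely as an illustrative assumption.

```r
library("ggplot2")
library("highcharter")

# mtcars ships with R; wt and mpg are two of its numeric columns

# ggplot2: aes() maps columns to x and y, geom_point() chooses the chart type
gg_example <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# highcharter: hcaes() maps columns to x and y, type chooses the chart type
hc_example <- hchart(mtcars, type = "scatter", hcaes(x = wt, y = mpg))
```

Printing gg_example draws a static plot, while printing hc_example renders an interactive htmlwidget.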
country_to_country_counts <- journeys %>%
count(start.country, end.country) %>%
mutate(journey = paste(start.country, "->", end.country)) %>%
arrange(n) %>%
mutate(journey = as.factor(journey)) %>%
mutate(journey = fct_reorder(journey, n))
gg_country_to_country_counts <- country_to_country_counts %>%
ggplot(aes(x = journey, y = n)) + geom_col() +
coord_flip() +
xlab("") +
ylab("Number of journeys") +
ggtitle("Number of journeys split by start and end country")
gg_country_to_country_counts
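The ggplot2 chart above orders its bars by the levels of the journey factor, which is why fct_reorder is applied before plotting. A small sketch with hypothetical journeys and counts shows the effect on the levels:

```r
library("forcats")

# Hypothetical journey labels and counts
toy_journey <- factor(c("A -> B", "B -> C", "C -> A"))
toy_n <- c(10, 2, 5)

# By default, factor levels are sorted alphabetically
levels(toy_journey)

# fct_reorder() sorts the levels by the accompanying values, ascending
toy_reordered <- fct_reorder(toy_journey, toy_n)
levels(toy_reordered)
# → "B -> C" "C -> A" "A -> B"
```

Only the levels change, not the data itself, so the rows of the data frame stay in their original order.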
journeys %>%
count(start.country, end.country) %>%
mutate(journey = paste(start.country, "->", end.country)) %>%
arrange(desc(n)) %>%
hchart(type = "bar",
hcaes(x = journey, y = n)) %>%
hc_xAxis(title = list(text = "")) %>%
hc_yAxis(title = list(text = "Number of journeys")) %>%
hc_title(text = "Number of journeys split by start and end country")
When constructing grouped or stacked bar charts in ggplot2 or highcharter, the data must first be reshaped into long format, with one row per combination of category and measure.
This is achieved with gather from the tidyr library.
end_country_tallies <- journeys %>%
group_by(end.country) %>%
summarise(total.letters = sum(number.of.letters),
total.journeys = n()) %>%
mutate(
total.letters = total.letters / sum(total.letters),
total.journeys = total.journeys / sum(total.journeys)
) %>%
arrange(total.letters) %>%
mutate(end.country = fct_reorder(end.country, total.letters)) %>%
gather(measure, value,-end.country)
end_country_tallies
## # A tibble: 12 x 3
## end.country measure value
## <fctr> <chr> <dbl>
## 1 POL total.letters 2.129971e-05
## 2 BEL total.letters 2.108671e-03
## 3 USA total.letters 2.268419e-02
## 4 GDR total.letters 1.583633e-01
## 5 DEU total.letters 1.875013e-01
## 6 GER total.letters 6.293212e-01
## 7 POL total.journeys 1.163467e-03
## 8 BEL total.journeys 5.235602e-03
## 9 USA total.journeys 9.482257e-02
## 10 GDR total.journeys 3.606748e-02
## 11 DEU total.journeys 3.030832e-01
## 12 GER total.journeys 5.596277e-01
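In tidyr 1.0.0 and later, gather is superseded by pivot_longer, which the tidyr documentation recommends for new code. The sketch below, on a hypothetical two-country table, shows the equivalent call. Both produce the same rows, though in a different order: gather stacks one measure after another, while pivot_longer keeps the rows for each country together.

```r
library("tidyr")

# Hypothetical wide table with one column per measure
toy_wide <- data.frame(
  end.country = c("USA", "BEL"),
  total.letters = c(0.9, 0.1),
  total.journeys = c(0.7, 0.3)
)

# Reshape with gather(), as in the code above
toy_long_gather <- gather(toy_wide, measure, value, -end.country)

# Equivalent reshape with pivot_longer() (requires tidyr >= 1.0.0)
toy_long_pivot <- pivot_longer(
  toy_wide,
  cols = -end.country,
  names_to = "measure",
  values_to = "value"
)

toy_long_gather
toy_long_pivot
```

Row order only matters here if a later step relies on it; the factor levels set by fct_reorder control the bar order in ggplot2 regardless.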
end_country_tallies %>%
ggplot(aes(x = end.country,
y = value,
fill = measure)) +
geom_col(position = "dodge") +
coord_flip() +
xlab("Final Destination Country") +
ylab("Percentage") +
ggtitle("Final destination of letters",
subtitle = "Ordered by percentage of letters") +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(
values = c("#1b9e77", "#7570b3"),
name = "",
breaks = c("total.letters", "total.journeys"),
labels = c("Percentage of letters", "Percentage of journeys")
)
end_country_tallies %>%
hchart(type = "bar",
hcaes(x = end.country,
y = 100 * value,
group = measure)) %>%
hc_xAxis(title = list(text = "Final Destination Country"),
reversed = FALSE) %>%
hc_yAxis(title = list(text = "Percentage"),
labels = list(format = '{value}%')) %>%
hc_title(text = "Final destination of letters") %>%
hc_subtitle(text = "Ordered by percentage of letters")